Survival of the Fittest: Cox Regression

Capstone 6940

Hazardous Conditions: Kayla Boyd & Hermela Shimelis

Intro to Cox Regression

  • Cox regression is a popular regression modeling method used to predict survival rates given certain covariates.

    • “Survival” can refer to the development of a symptom, time to relapse after remission, or as a time to death [1].

    • Cox regression model is based on the hazard function h_x(t) with covariates at time t given by:

      h_x(t)=h_0(t)\exp(\beta_1x_1 +\beta_2x_2 + \dots + \beta_p x_n) h_x(t) = hazard \ function\ h_0(t) = Baseline \ hazard \ function\

      \beta_1x_1 +\beta_2x_2 + \dots + \beta_p x_n = linear\ combination\ of \ covariates\ and\ their\ coefficients


Methods: Making Sense of Cox Regression

  • The assumption of a constant relationship between dependent and explanatory variables is called proportional hazards [2].

  • The hazard function is the probability that an individual will experience an event (death) within a certain time interval [1].

    • If the risk factor is binary, the result can be interpreted as the estimated increase in HR in patients with the risk factor vs. those without [3].

    • If the risk factor is continuous, the HR is interpreted as an increase/decrease in the hazard rate of death due to a 1 unit increase in the variable [3].

Graphical strategies to assess proportionality assumption:

  • Kaplan-Meier Survival Distribution S(t): Plot (S(t)) as a function of survival time for each level covariate.

  • Plot the function (\log(-\log(S(t)))) as a function of the log survival time (log represent natural log).

  • Schoenfeld Residuals

  • Failing to meet the assumption of proportional hazards means that the effects between dependent and explanatory variables are not constant over time.

  • Time-varying covariates (coefficients) are used when a covariate changes over time during the follow-up period [4].

    • Example: The effect of the size of a patient’s tumor on their chances of survival.
  • Internal time-varying coefficients are affected by survival status and include values that are generated by the subject [4].

    • A patient’s blood pressure levels during a study on cardiovascular events.
  • External time-varying coefficients are pre-determined and external to the subject under study [4].

    • Pollen levels during a study on patients with asthma.

Analysis: Corporate Survival

By Kayla Boyd

Data Source: Employee Turnover Data from Kaggle

Variable Description
stag Experience (time)
event Employee turnover
gender Employee’s gender
age Employee’s age (year)
industry Employee’s industry
profession Employee’s profession
traffic From what pipelene employee came to the company.
coach Presence of a coach (training) on probation
head_gender Head (supervisor) gender
greywage The salary does not seem to the tax authorities.
way Employee’s way of transportation
extraversion Extroversion score
independ Independence score
selfcontrol Self control score
anxiety Anxiety score
novator Innovator score
Number of missing values : 0
Number of duplicates:  13

49.82079 % of observations are censored

Exploratory Data Analysis

Show Code
# Categorical covariates

# Selection of covariates (that are only discrete)
turnover.cat <- turnover %>% 
                select_if(is.factor) %>% 
                mutate(event = turnover$event)

turnover.cat %>% ggplot(aes(x = gender, color = factor(event),
                            fill = factor(event))) +
  geom_bar(alpha = 0.5) +
  scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Dark2") +
  theme_minimal() + theme(legend.position = "top")

Show Code
turnover.cat %>% ggplot(aes(x = industry, color = factor(event),
                            fill = factor(event))) +
  geom_bar(alpha = 0.5) + 
  scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Dark2") +
  theme_minimal() + theme(legend.position = "top")

Show Code
turnover.cat %>% ggplot(aes(x = profession, color = factor(event),
                            fill = factor(event))) +
  geom_bar(alpha = 0.5) + 
  scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Dark2") +
  theme_minimal() + theme(legend.position = "top")

Show Code
turnover.cat %>% ggplot(aes(x = traffic, color = factor(event),
                            fill = factor(event))) +
  geom_bar(alpha = 0.5) + 
  scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Dark2") +
  theme_minimal() + theme(legend.position = "top")

Show Code
turnover.cat %>% ggplot(aes(x = coach, color = factor(event),
                            fill = factor(event))) +
  geom_bar(alpha = 0.5) + 
  scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Dark2") +
  theme_minimal() + theme(legend.position = "top")

Show Code
turnover.cat %>% ggplot(aes(x = head_gender, color = factor(event),
                            fill = factor(event))) +
  geom_bar(alpha = 0.5) + 
  scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Dark2") +
  theme_minimal() + theme(legend.position = "top")

Show Code
turnover.cat %>% ggplot(aes(x = greywage, color = factor(event),
                            fill = factor(event))) +
  geom_bar(alpha = 0.5) + 
  scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Dark2") +
  theme_minimal() + theme(legend.position = "top")

Show Code
turnover.cat %>% ggplot(aes(x = way, color = factor(event), 
                                     fill = factor(event))) +
  geom_bar(alpha = 0.5) + 
  scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Dark2") +
  theme_minimal() + theme(legend.position = "top")

Show Code
# Continuous covariates
turnover %>% ggplot(aes(x = stag, color = factor(event), 
                        fill = factor(event))) +
  geom_histogram(aes(y = ..density..), alpha = 0.5) + 
  geom_density(alpha = 0.05) +  
  scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Dark2") +
  theme_minimal() + theme(legend.position = "top")

Show Code
turnover %>% ggplot(aes(x = event, color = factor(event),
                        fill = factor(event))) +
  geom_histogram(aes(y = ..density..), alpha = 0.5) + 
  geom_density(alpha = 0.05) +
  scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Dark2") + 
  theme_minimal() + theme(legend.position = "top")

Show Code
turnover %>% ggplot(aes(x = age, color = factor(event),
                        fill = factor(event))) +
  geom_histogram(aes(y = ..density..), alpha = 0.5) +
  geom_density(alpha = 0.05) +
  scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Dark2") + 
  theme_minimal() + theme(legend.position = "top")

Show Code
turnover %>% ggplot(aes(x = extraversion, color = factor(event), 
                        fill = factor(event))) + 
  geom_histogram(aes(y = ..density..), alpha = 0.5) + 
  geom_density(alpha = 0.05) + scale_color_brewer(palette = "Dark2") + 
  scale_fill_brewer(palette = "Dark2") + theme_minimal() + 
  theme(legend.position = "top")

Show Code
turnover %>% ggplot(aes(x = independ, color = factor(event), 
                        fill = factor(event))) +
  geom_histogram(aes(y = ..density..), alpha = 0.5) + 
  geom_density(alpha = 0.05) + 
  scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Dark2") + 
  theme_minimal() + theme(legend.position = "top")

Show Code
turnover %>% ggplot(aes(x = selfcontrol, color = factor(event),
                        fill = factor(event))) +
  geom_histogram(aes(y = ..density..), alpha = 0.5) +
  geom_density(alpha = 0.05) + 
  scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Dark2") +
  theme_minimal() + theme(legend.position = "top")

Show Code
turnover %>% ggplot(aes(x = anxiety, color = factor(event),
                        fill = factor(event))) +
  geom_histogram(aes(y = ..density..), alpha = 0.5) +
  geom_density(alpha = 0.05) + 
  scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Dark2") +
  theme_minimal() + theme(legend.position = "top")

Show Code
turnover %>% ggplot(aes(x = novator, color = factor(event),
                        fill = factor(event))) +
  geom_histogram(aes(y = ..density..), alpha = 0.5) +
  geom_density(alpha = 0.05) + 
  scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Dark2") +
  theme_minimal() + theme(legend.position = "top")

Show Code
# Other encoding of discrete variables
turnover.num <- turnover.cat[-9]

levels(turnover.num$gender) <- 1:length(levels(turnover.cat$gender))
levels(turnover.num$industry) <- 1:length(levels(turnover.cat$industry))
levels(turnover.num$profession) <- 1:length(levels(turnover.cat$profession))
levels(turnover.num$traffic) <- 1:length(levels(turnover.cat$traffic))
levels(turnover.num$coach) <- 1:length(levels(turnover.cat$coach))
levels(turnover.num$head_gender) <- 1:length(levels(turnover.cat$head_gender))
levels(turnover.num$greywage) <- 1:length(levels(turnover.cat$greywage))
levels(turnover.num$way) <- 1:length(levels(turnover.cat$way))
turnover.num <- as.data.frame(apply(turnover.num, 2, as.numeric))

turnover.num <- cbind(turnover.num, turnover %>%
    select_if(is.numeric))

# Correlation matrix
corrplot(cor(turnover.num), col = brewer.pal(10, "BrBG"), method = "square", diag = FALSE)

Survival Plots

Now we will plot our categorical variables (gender, industry, profession, traffic, coach, head_gender, greywage, and way) using the Kaplan-Meier Estimation Curve. The Kaplan-Meier allows us to visualize how each category of survival probability differs from the others.

Show Code
km_gender<-survfit(Surv(stag, event)~gender, data = turnover, type="kaplan-meier")

ggsurvplot(km_gender, data=turnover,
           conf.int = FALSE,
           ggtheme = theme_minimal(),
           legend.labs = c("female", "male"),
           pval = TRUE,
           pval.method = TRUE)+
  ggtitle("Survival curve based on Gender")

Show Code
km_industry<-survfit(Surv(stag, event)~industry, data = turnover, type ="kaplan-meier")
ggsurvplot(km_industry, data=turnover,
           conf.int = FALSE,
           ggtheme = theme_minimal(),
           pval = TRUE,
           pval.method = TRUE)+
  ggtitle("Survival curve based on Industry")

Show Code
km_profession<-survfit(Surv(stag, event)~profession, data = turnover, type="kaplan-meier")
ggsurvplot(km_profession, data=turnover,
           conf.int = FALSE,
           ggtheme = theme_minimal(),
           pval = TRUE,
           pval.method = TRUE)+
  ggtitle("Survival curve based on Profession")

Show Code
km_traffic<-survfit(Surv(stag, event)~traffic, data= turnover, type="kaplan-meier")
ggsurvplot(km_traffic, data=turnover,
           conf.int = FALSE,
           ggtheme = theme_minimal(),
           pval = TRUE,
           pval.method = TRUE)+
  ggtitle("Survival curve based on Traffic")

Show Code
km_coach<-survfit(Surv(stag, event)~coach, data = turnover, type="kaplan-meier")
ggsurvplot(km_coach, data=turnover,
           conf.int = FALSE,
           ggtheme = theme_minimal(),
           pval = TRUE,
           pval.method = TRUE)+
  ggtitle("Survival curve based on Coach")

Show Code
km_headgender<-survfit(Surv(stag, event)~head_gender, data= turnover, type="kaplan-meier")
ggsurvplot(km_headgender, data=turnover,
           conf.int = FALSE,
           ggtheme = theme_minimal(),
           legend.labs = c("female", "male"),
           pval = TRUE,
           pval.method = TRUE)+
  ggtitle("Survival curve based on Head Gender")

Show Code
km_greywage<-survfit(Surv(stag, event)~greywage, data = turnover, type ="kaplan-meier")
ggsurvplot(km_greywage, data=turnover,
           conf.int = FALSE,
           ggtheme = theme_minimal(),
           legend.labs = c("grey", "white"),
           pval = TRUE,
           pval.method = TRUE)+
  ggtitle("Survival curve based on Greywage")

Show Code
km_way<-survfit(Surv(stag, event)~way,
                data= turnover,
                type="kaplan-meier")
ggsurvplot(km_way, data=turnover,
           conf.int = FALSE,
           ggtheme = theme_minimal(),
           legend.labs=c("bus", "car", "foot"),
           pval = TRUE,
           pval.method = TRUE)+
  ggtitle("Survival curve based on Commuters(way)")

Modeling the Outcomes

Show Code
model0<-coxph(Surv(stag, event)~.,
                 data = turnover)
summary(model0)
Call:
coxph(formula = Surv(stag, event) ~ ., data = turnover)

  n= 1116, number of events= 560 

                                   coef exp(coef)  se(coef)      z Pr(>|z|)    
genderm                       -0.110567  0.895327  0.127236 -0.869 0.384855    
age                            0.021321  1.021550  0.006992  3.050 0.002292 ** 
industryBanks                 -0.228597  0.795649  0.370386 -0.617 0.537112    
industryBuilding              -0.228859  0.795441  0.394945 -0.579 0.562272    
industryConsult               -0.387826  0.678530  0.378031 -1.026 0.304934    
industryetc                   -0.571840  0.564486  0.376067 -1.521 0.128365    
industryHoReCa                -0.641488  0.526508  0.545700 -1.176 0.239782    
industryIT                    -1.188633  0.304637  0.392874 -3.025 0.002482 ** 
industrymanufacture           -0.799654  0.449484  0.373800 -2.139 0.032415 *  
industryMining                -0.603336  0.546984  0.449444 -1.342 0.179465    
industryPharma                -1.005353  0.365916  0.480887 -2.091 0.036562 *  
industryPowerGeneration       -1.032475  0.356124  0.451676 -2.286 0.022261 *  
industryRealEstate            -1.725592  0.178068  0.588426 -2.933 0.003362 ** 
industryRetail                -1.042228  0.352668  0.363732 -2.865 0.004165 ** 
industryState                 -0.667077  0.513206  0.410168 -1.626 0.103875    
industryTelecom               -1.186918  0.305160  0.448385 -2.647 0.008119 ** 
industrytransport             -0.852135  0.426504  0.427996 -1.991 0.046482 *  
professionBusinessDevelopment  0.600890  1.823741  0.508792  1.181 0.237597    
professionCommercial           0.998866  2.715201  0.507452  1.968 0.049022 *  
professionConsult              0.570631  1.769384  0.520478  1.096 0.272921    
professionEngineer             0.998848  2.715151  0.538726  1.854 0.063726 .  
professionetc                  0.486204  1.626132  0.486495  0.999 0.317600    
professionFinance              0.054707  1.056231  0.526353  0.104 0.917220    
professionHR                   0.202198  1.224091  0.429582  0.471 0.637865    
professionIT                   0.069725  1.072213  0.491667  0.142 0.887228    
professionLaw                  0.403392  1.496894  0.647872  0.623 0.533520    
professionmanage               1.284135  3.611541  0.500159  2.567 0.010245 *  
professionMarketing            0.726077  2.066956  0.482551  1.505 0.132411    
professionPR                   0.846313  2.331035  0.640073  1.322 0.186097    
professionSales                0.505293  1.657472  0.467728  1.080 0.280002    
professionTeaching             0.617169  1.853673  0.569441  1.084 0.278446    
trafficempjs                   0.928219  2.529999  0.314661  2.950 0.003179 ** 
trafficfriends                 0.122161  1.129937  0.342264  0.357 0.721151    
trafficKA                      0.141661  1.152186  0.353556  0.401 0.688659    
trafficrabrecNErab             0.548112  1.729984  0.310084  1.768 0.077124 .  
trafficrecNErab               -0.051263  0.950029  0.381447 -0.134 0.893095    
trafficreferal                 0.368137  1.445041  0.325262  1.132 0.257711    
trafficyoujs                   0.654608  1.924387  0.309447  2.115 0.034395 *  
coachno                        0.056040  1.057640  0.111493  0.503 0.615221    
coachyes                       0.211061  1.234988  0.151771  1.391 0.164329    
head_genderm                   0.055167  1.056717  0.102616  0.538 0.590846    
greywagewhite                 -0.505741  0.603059  0.135383 -3.736 0.000187 ***
waycar                        -0.201625  0.817401  0.103831 -1.942 0.052153 .  
wayfoot                       -0.402833  0.668424  0.174165 -2.313 0.020726 *  
extraversion                   0.016623  1.016761  0.035460  0.469 0.639236    
independ                      -0.019380  0.980806  0.035771 -0.542 0.587962    
selfcontrol                   -0.045743  0.955287  0.035878 -1.275 0.202321    
anxiety                       -0.048800  0.952372  0.034633 -1.409 0.158827    
novator                        0.009108  1.009150  0.030631  0.297 0.766196    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                              exp(coef) exp(-coef) lower .95 upper .95
genderm                          0.8953     1.1169    0.6977    1.1489
age                              1.0216     0.9789    1.0076    1.0356
industryBanks                    0.7956     1.2568    0.3850    1.6444
industryBuilding                 0.7954     1.2572    0.3668    1.7250
industryConsult                  0.6785     1.4738    0.3234    1.4235
industryetc                      0.5645     1.7715    0.2701    1.1797
industryHoReCa                   0.5265     1.8993    0.1807    1.5343
industryIT                       0.3046     3.2826    0.1410    0.6580
industrymanufacture              0.4495     2.2248    0.2160    0.9352
industryMining                   0.5470     1.8282    0.2267    1.3199
industryPharma                   0.3659     2.7329    0.1426    0.9391
industryPowerGeneration          0.3561     2.8080    0.1469    0.8631
industryRealEstate               0.1781     5.6158    0.0562    0.5642
industryRetail                   0.3527     2.8355    0.1729    0.7194
industryState                    0.5132     1.9485    0.2297    1.1466
industryTelecom                  0.3052     3.2770    0.1267    0.7348
industrytransport                0.4265     2.3446    0.1843    0.9868
professionBusinessDevelopment    1.8237     0.5483    0.6728    4.9436
professionCommercial             2.7152     0.3683    1.0043    7.3408
professionConsult                1.7694     0.5652    0.6380    4.9074
professionEngineer               2.7152     0.3683    0.9446    7.8047
professionetc                    1.6261     0.6150    0.6267    4.2195
professionFinance                1.0562     0.9468    0.3765    2.9634
professionHR                     1.2241     0.8169    0.5274    2.8410
professionIT                     1.0722     0.9327    0.4090    2.8105
professionLaw                    1.4969     0.6681    0.4205    5.3292
professionmanage                 3.6115     0.2769    1.3551    9.6256
professionMarketing              2.0670     0.4838    0.8028    5.3221
professionPR                     2.3310     0.4290    0.6648    8.1730
professionSales                  1.6575     0.6033    0.6627    4.1455
professionTeaching               1.8537     0.5395    0.6072    5.6590
trafficempjs                     2.5300     0.3953    1.3655    4.6877
trafficfriends                   1.1299     0.8850    0.5777    2.2100
trafficKA                        1.1522     0.8679    0.5762    2.3039
trafficrabrecNErab               1.7300     0.5780    0.9421    3.1768
trafficrecNErab                  0.9500     1.0526    0.4498    2.0064
trafficreferal                   1.4450     0.6920    0.7639    2.7337
trafficyoujs                     1.9244     0.5196    1.0493    3.5293
coachno                          1.0576     0.9455    0.8500    1.3160
coachyes                         1.2350     0.8097    0.9172    1.6628
head_genderm                     1.0567     0.9463    0.8642    1.2921
greywagewhite                    0.6031     1.6582    0.4625    0.7863
waycar                           0.8174     1.2234    0.6669    1.0019
wayfoot                          0.6684     1.4961    0.4751    0.9404
extraversion                     1.0168     0.9835    0.9485    1.0899
independ                         0.9808     1.0196    0.9144    1.0520
selfcontrol                      0.9553     1.0468    0.8904    1.0249
anxiety                          0.9524     1.0500    0.8899    1.0193
novator                          1.0092     0.9909    0.9503    1.0716

Concordance= 0.661  (se = 0.012 )
Likelihood ratio test= 173.8  on 49 df,   p=7e-16
Wald test            = 176.7  on 49 df,   p=3e-16
Score (logrank) test = 183.5  on 49 df,   p=<2e-16

We want to refine the model to use the independent variable(s) with the greatest influence from our crude model. To do this, we will use the “step” function in R which iteratively adds or removes variables based on a chosen criterion. For this model we will use the AIC value. The Akaike Information Criterion (AIC) is a statistical metric that evaluates how well a model fits a given set of data.

Show Code
step(model0)
Start:  AIC=6717.67
Surv(stag, event) ~ gender + age + industry + profession + traffic + 
    coach + head_gender + greywage + way + extraversion + independ + 
    selfcontrol + anxiety + novator

               Df    AIC
- coach         2 6715.6
- novator       1 6715.8
- extraversion  1 6715.9
- head_gender   1 6716.0
- independ      1 6716.0
- gender        1 6716.4
- selfcontrol   1 6717.3
- anxiety       1 6717.7
<none>            6717.7
- profession   14 6720.5
- way           2 6721.8
- age           1 6724.8
- greywage      1 6728.3
- traffic       7 6741.6
- industry     15 6746.9

Step:  AIC=6715.58
Surv(stag, event) ~ gender + age + industry + profession + traffic + 
    head_gender + greywage + way + extraversion + independ + 
    selfcontrol + anxiety + novator

               Df    AIC
- novator       1 6713.8
- head_gender   1 6713.8
- extraversion  1 6713.8
- independ      1 6713.9
- gender        1 6714.3
- selfcontrol   1 6715.2
<none>            6715.6
- anxiety       1 6715.7
- profession   14 6718.8
- way           2 6719.2
- age           1 6722.8
- greywage      1 6726.4
- traffic       7 6739.4
- industry     15 6746.4

Step:  AIC=6713.76
Surv(stag, event) ~ gender + age + industry + profession + traffic + 
    head_gender + greywage + way + extraversion + independ + 
    selfcontrol + anxiety

               Df    AIC
- extraversion  1 6712.1
- head_gender   1 6712.1
- independ      1 6712.1
- gender        1 6712.5
- anxiety       1 6713.7
<none>            6713.8
- selfcontrol   1 6714.5
- profession   14 6716.9
- way           2 6717.3
- age           1 6721.1
- greywage      1 6724.4
- traffic       7 6737.5
- industry     15 6744.5

Step:  AIC=6712.05
Surv(stag, event) ~ gender + age + industry + profession + traffic + 
    head_gender + greywage + way + independ + selfcontrol + anxiety

              Df    AIC
- head_gender  1 6710.4
- gender       1 6710.8
- independ     1 6711.0
<none>           6712.1
- anxiety      1 6713.4
- profession  14 6715.5
- way          2 6715.6
- selfcontrol  1 6717.3
- age          1 6719.1
- greywage     1 6722.5
- traffic      7 6736.4
- industry    15 6743.3

Step:  AIC=6710.35
Surv(stag, event) ~ gender + age + industry + profession + traffic + 
    greywage + way + independ + selfcontrol + anxiety

              Df    AIC
- gender       1 6709.0
- independ     1 6709.4
<none>           6710.4
- anxiety      1 6711.9
- way          2 6713.9
- profession  14 6714.1
- selfcontrol  1 6715.9
- age          1 6719.1
- greywage     1 6720.9
- traffic      7 6734.4
- industry    15 6741.9

Step:  AIC=6709
Surv(stag, event) ~ age + industry + profession + traffic + greywage + 
    way + independ + selfcontrol + anxiety

              Df    AIC
- independ     1 6708.2
<none>           6709.0
- anxiety      1 6711.7
- profession  14 6712.1
- way          2 6712.8
- selfcontrol  1 6715.3
- age          1 6717.6
- greywage     1 6720.0
- traffic      7 6733.2
- industry    15 6740.7

Step:  AIC=6708.17
Surv(stag, event) ~ age + industry + profession + traffic + greywage + 
    way + selfcontrol + anxiety

              Df    AIC
<none>           6708.2
- anxiety      1 6709.7
- profession  14 6710.9
- way          2 6712.0
- selfcontrol  1 6713.4
- age          1 6716.1
- greywage     1 6719.4
- traffic      7 6732.4
- industry    15 6739.6
Call:
coxph(formula = Surv(stag, event) ~ age + industry + profession + 
    traffic + greywage + way + selfcontrol + anxiety, data = turnover)

                                  coef exp(coef) se(coef)      z        p
age                            0.02055   1.02076  0.00641  3.205 0.001349
industryBanks                 -0.28975   0.74845  0.36289 -0.798 0.424614
industryBuilding              -0.26582   0.76658  0.38880 -0.684 0.494175
industryConsult               -0.45691   0.63324  0.36927 -1.237 0.215967
industryetc                   -0.64403   0.52517  0.36896 -1.746 0.080890
industryHoReCa                -0.77805   0.45930  0.54017 -1.440 0.149758
industryIT                    -1.24814   0.28704  0.38553 -3.237 0.001206
industrymanufacture           -0.87933   0.41506  0.36757 -2.392 0.016744
industryMining                -0.65940   0.51716  0.44284 -1.489 0.136485
industryPharma                -1.04233   0.35263  0.47232 -2.207 0.027324
industryPowerGeneration       -1.09181   0.33561  0.44269 -2.466 0.013651
industryRealEstate            -1.81850   0.16227  0.58090 -3.130 0.001745
industryRetail                -1.10739   0.33042  0.35511 -3.118 0.001818
industryState                 -0.73784   0.47814  0.40157 -1.837 0.066154
industryTelecom               -1.25500   0.28508  0.43786 -2.866 0.004154
industrytransport             -0.86358   0.42165  0.42212 -2.046 0.040775
professionBusinessDevelopment  0.59092   1.80564  0.50153  1.178 0.238702
professionCommercial           1.00727   2.73813  0.49840  2.021 0.043279
professionConsult              0.54819   1.73012  0.50936  1.076 0.281828
professionEngineer             0.94501   2.57284  0.52648  1.795 0.072662
professionetc                  0.45756   1.58021  0.48289  0.948 0.343360
professionFinance              0.05804   1.05976  0.51797  0.112 0.910784
professionHR                   0.22086   1.24714  0.42463  0.520 0.602987
professionIT                   0.02934   1.02978  0.47418  0.062 0.950654
professionLaw                  0.31703   1.37304  0.64071  0.495 0.620736
professionmanage               1.28255   3.60582  0.49774  2.577 0.009974
professionMarketing            0.70287   2.01955  0.47822  1.470 0.141624
professionPR                   0.82179   2.27458  0.63824  1.288 0.197889
professionSales                0.50708   1.66044  0.45661  1.111 0.266774
professionTeaching             0.62625   1.87058  0.56833  1.102 0.270502
trafficempjs                   0.85644   2.35476  0.30998  2.763 0.005729
trafficfriends                 0.03493   1.03555  0.33616  0.104 0.917244
trafficKA                      0.10265   1.10810  0.34993  0.293 0.769274
trafficrabrecNErab             0.48122   1.61805  0.30601  1.573 0.115817
trafficrecNErab               -0.12679   0.88092  0.37753 -0.336 0.736996
trafficreferal                 0.30802   1.36073  0.32125  0.959 0.337642
trafficyoujs                   0.60301   1.82761  0.30666  1.966 0.049257
greywagewhite                 -0.51396   0.59812  0.13401 -3.835 0.000125
waycar                        -0.21103   0.80975  0.10259 -2.057 0.039682
wayfoot                       -0.37776   0.68539  0.17333 -2.179 0.029301
selfcontrol                   -0.06082   0.94099  0.02269 -2.680 0.007353
anxiety                       -0.04924   0.95196  0.02619 -1.880 0.060130

Likelihood ratio test=169.3  on 42 df, p=< 2.2e-16
n= 1116, number of events= 560 

Using these results, we can see that the model including independent variables age, industry, profession, traffic, greywage, selfcontrol, and anxiety produced the most accurate survival model. We will use these variables in our adjusted model.

Show Code
model1<-coxph(Surv(stag, event)~age + industry + profession + traffic + greywage + way + selfcontrol + anxiety,
                data = turnover)
summary(model1)
Call:
coxph(formula = Surv(stag, event) ~ age + industry + profession + 
    traffic + greywage + way + selfcontrol + anxiety, data = turnover)

  n= 1116, number of events= 560 

                                  coef exp(coef) se(coef)      z Pr(>|z|)    
age                            0.02055   1.02076  0.00641  3.205 0.001349 ** 
industryBanks                 -0.28975   0.74845  0.36289 -0.798 0.424614    
industryBuilding              -0.26582   0.76658  0.38880 -0.684 0.494175    
industryConsult               -0.45691   0.63324  0.36927 -1.237 0.215967    
industryetc                   -0.64403   0.52517  0.36896 -1.746 0.080890 .  
industryHoReCa                -0.77805   0.45930  0.54017 -1.440 0.149758    
industryIT                    -1.24814   0.28704  0.38553 -3.237 0.001206 ** 
industrymanufacture           -0.87933   0.41506  0.36757 -2.392 0.016744 *  
industryMining                -0.65940   0.51716  0.44284 -1.489 0.136485    
industryPharma                -1.04233   0.35263  0.47232 -2.207 0.027324 *  
industryPowerGeneration       -1.09181   0.33561  0.44269 -2.466 0.013651 *  
industryRealEstate            -1.81850   0.16227  0.58090 -3.130 0.001745 ** 
industryRetail                -1.10739   0.33042  0.35511 -3.118 0.001818 ** 
industryState                 -0.73784   0.47814  0.40157 -1.837 0.066154 .  
industryTelecom               -1.25500   0.28508  0.43786 -2.866 0.004154 ** 
industrytransport             -0.86358   0.42165  0.42212 -2.046 0.040775 *  
professionBusinessDevelopment  0.59092   1.80564  0.50153  1.178 0.238702    
professionCommercial           1.00727   2.73813  0.49840  2.021 0.043279 *  
professionConsult              0.54819   1.73012  0.50936  1.076 0.281828    
professionEngineer             0.94501   2.57284  0.52648  1.795 0.072662 .  
professionetc                  0.45756   1.58021  0.48289  0.948 0.343360    
professionFinance              0.05804   1.05976  0.51797  0.112 0.910784    
professionHR                   0.22086   1.24714  0.42463  0.520 0.602987    
professionIT                   0.02934   1.02978  0.47418  0.062 0.950654    
professionLaw                  0.31703   1.37304  0.64071  0.495 0.620736    
professionmanage               1.28255   3.60582  0.49774  2.577 0.009974 ** 
professionMarketing            0.70287   2.01955  0.47822  1.470 0.141624    
professionPR                   0.82179   2.27458  0.63824  1.288 0.197889    
professionSales                0.50708   1.66044  0.45661  1.111 0.266774    
professionTeaching             0.62625   1.87058  0.56833  1.102 0.270502    
trafficempjs                   0.85644   2.35476  0.30998  2.763 0.005729 ** 
trafficfriends                 0.03493   1.03555  0.33616  0.104 0.917244    
trafficKA                      0.10265   1.10810  0.34993  0.293 0.769274    
trafficrabrecNErab             0.48122   1.61805  0.30601  1.573 0.115817    
trafficrecNErab               -0.12679   0.88092  0.37753 -0.336 0.736996    
trafficreferal                 0.30802   1.36073  0.32125  0.959 0.337642    
trafficyoujs                   0.60301   1.82761  0.30666  1.966 0.049257 *  
greywagewhite                 -0.51396   0.59812  0.13401 -3.835 0.000125 ***
waycar                        -0.21103   0.80975  0.10259 -2.057 0.039682 *  
wayfoot                       -0.37776   0.68539  0.17333 -2.179 0.029301 *  
selfcontrol                   -0.06082   0.94099  0.02269 -2.680 0.007353 ** 
anxiety                       -0.04924   0.95196  0.02619 -1.880 0.060130 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                              exp(coef) exp(-coef) lower .95 upper .95
age                              1.0208     0.9797   1.00801    1.0337
industryBanks                    0.7485     1.3361   0.36751    1.5243
industryBuilding                 0.7666     1.3045   0.35777    1.6425
industryConsult                  0.6332     1.5792   0.30707    1.3058
industryetc                      0.5252     1.9041   0.25483    1.0823
industryHoReCa                   0.4593     2.1772   0.15933    1.3240
industryIT                       0.2870     3.4839   0.13483    0.6111
industrymanufacture              0.4151     2.4093   0.20195    0.8531
industryMining                   0.5172     1.9336   0.21711    1.2319
industryPharma                   0.3526     2.8358   0.13973    0.8899
industryPowerGeneration          0.3356     2.9797   0.14093    0.7992
industryRealEstate               0.1623     6.1626   0.05197    0.5066
industryRetail                   0.3304     3.0264   0.16474    0.6627
industryState                    0.4781     2.0914   0.21764    1.0505
industryTelecom                  0.2851     3.5078   0.12085    0.6725
industrytransport                0.4217     2.3716   0.18435    0.9644
professionBusinessDevelopment    1.8056     0.5538   0.67567    4.8254
professionCommercial             2.7381     0.3652   1.03089    7.2727
professionConsult                1.7301     0.5780   0.63753    4.6951
professionEngineer               2.5728     0.3887   0.91679    7.2203
professionetc                    1.5802     0.6328   0.61331    4.0715
professionFinance                1.0598     0.9436   0.38398    2.9249
professionHR                     1.2471     0.8018   0.54258    2.8666
professionIT                     1.0298     0.9711   0.40656    2.6084
professionLaw                    1.3730     0.7283   0.39112    4.8201
professionmanage                 3.6058     0.2773   1.35933    9.5650
professionMarketing              2.0195     0.4952   0.79103    5.1560
professionPR                     2.2746     0.4396   0.65107    7.9465
professionSales                  1.6604     0.6023   0.67850    4.0634
professionTeaching               1.8706     0.5346   0.61406    5.6983
trafficempjs                     2.3548     0.4247   1.28261    4.3232
trafficfriends                   1.0355     0.9657   0.53583    2.0013
trafficKA                        1.1081     0.9024   0.55810    2.2001
trafficrabrecNErab               1.6181     0.6180   0.88821    2.9476
trafficrecNErab                  0.8809     1.1352   0.42032    1.8463
trafficreferal                   1.3607     0.7349   0.72498    2.5540
trafficyoujs                     1.8276     0.5472   1.00196    3.3336
greywagewhite                    0.5981     1.6719   0.45996    0.7778
waycar                           0.8098     1.2349   0.66226    0.9901
wayfoot                          0.6854     1.4590   0.48798    0.9627
selfcontrol                      0.9410     1.0627   0.90005    0.9838
anxiety                          0.9520     1.0505   0.90432    1.0021

Concordance= 0.66  (se = 0.012 )
Likelihood ratio test= 169.3  on 42 df,   p=<2e-16
Wald test            = 172.2  on 42 df,   p=<2e-16
Score (logrank) test = 178.8  on 42 df,   p=<2e-16

Hazard ratios above 1 indicate a higher likelihood of the individuals in that category to experience the event (termination) in the given timeframe.

Show Code
hr=exp(model1$coefficients)
hr
                          age                 industryBanks 
                    1.0207593                     0.7484527 
             industryBuilding               industryConsult 
                    0.7665782                     0.6332371 
                  industryetc                industryHoReCa 
                    0.5251713                     0.4593025 
                   industryIT           industrymanufacture 
                    0.2870376                     0.4150602 
               industryMining                industryPharma 
                    0.5171611                     0.3526310 
      industryPowerGeneration            industryRealEstate 
                    0.3356074                     0.1622689 
               industryRetail                 industryState 
                    0.3304219                     0.4781437 
              industryTelecom             industrytransport 
                    0.2850769                     0.4216508 
professionBusinessDevelopment          professionCommercial 
                    1.8056453                     2.7381273 
            professionConsult            professionEngineer 
                    1.7301166                     2.5728377 
                professionetc             professionFinance 
                    1.5802152                     1.0597560 
                 professionHR                  professionIT 
                    1.2471440                     1.0297797 
                professionLaw              professionmanage 
                    1.3730396                     3.6058251 
          professionMarketing                  professionPR 
                    2.0195459                     2.2745773 
              professionSales            professionTeaching 
                    1.6604383                     1.8705848 
                 trafficempjs                trafficfriends 
                    2.3547649                     1.0355461 
                    trafficKA            trafficrabrecNErab 
                    1.1080976                     1.6180539 
              trafficrecNErab                trafficreferal 
                    0.8809202                     1.3607294 
                 trafficyoujs                 greywagewhite 
                    1.8276083                     0.5981229 
                       waycar                       wayfoot 
                    0.8097502                     0.6853927 
                  selfcontrol                       anxiety 
                    0.9409886                     0.9519574 

Model Comparison

Crude Model Adjusted Model
Covariate Hazard ratio (95% CI), p-value Hazard ratio (95% CI), p-value
genderm (0.70, 1.15) 0.384 Excluded
age (1.01, 1.04) 0.002 (1.01, 1.03) 0.001
industryBanks (0.39, 1.64) 0.537 (0.37, 1.52) 0.424
industryBuilding (0.37, 1.73) 0.562 (0.36, 1.64) 0.494
industryConsult (0.32, 1.42) 0.304 (0.31, 1.31) 0.215
industryetc (0.27, 1.18) 0.128 (0.25, 1.08) 0.080
industryHoReCa (0.18, 1.53) 0.239 (0.16, 1.32) 0.149
industryIT (0.14, 0.66) 0.002 (0.13, 0.61) 0.001
industrymanufacture (0.22, 0.94) 0.032 (0.20, 0.85) 0.016
industryMining (0.23, 1.32) 0.179 (0.21,1.23) 0.136
industryPharma (0.14, 0.94) 0.036 (0.14, 0.89) 0.027
industryPowerGeneration (0.15, 0.86) 0.022 (0.14, 0.80) 0.013
industryRealEstate (0.06, 0.56) 0.003 (0.05, 0.51) 0.001
industryRetail (0.17, 0.72) 0.004 (0.16, 0.66) 0.002
industryState (0.23,1.15) 0.104 (0.22, 1.05) 0.066
industryTelecom (0.13, 0.73) 0.008 (0.12, 0.67) 0.004
industrytransport (0.18, 0.99) 0.046 (0.18, 0.96) 0.041
professionBusinessDevelopment (0.67, 4.94) 0.237 (0.68, 4.83) 0.238
professionCommercial (1.00, 7.34) 0.049 (1.03, 7.27) 0.043
professionConsult (0.64, 4.91) 0.272 (0.64, 4.70) 0.281
professionEngineer (0.94, 7.80) 0.063 (0.92, 7.22) 0.072
professionetc (0.63, 4.22) 0.317 (0.61, 4.07) 0.343
professionFinance (0.377, 2.96) 0.917 (0.38, 2.92) 0.910
professionHR (0.53, 2.84) 0.638 (0.54, 2.87) 0.603
professionIT (0.41, 2.81) 0.887 (0.41, 2.61) 0.951
professionLaw (0.42, 5.33) 0.533 (0.39, 4.82) 0.621
professionmanage (1.36, 9.63) 0.010 (1.36, 9.57) 0.01
professionMarketing (0.80, 5.32) 0.132 (0.79, 5.16) 0.141
professionPR (0.66, 8.17) 0.186 (0.65, 7.95) 0.197
professionSales (0.66, 4.15) 0.280 (0.68, 4.06) 0.266
professionTeaching (0.61, 5.66) 0.278 (0.61, 5.70) 0.270
trafficempjs (1.37, 4.69) 0.003 (1.28, 4.32) 0.006
trafficfriends (0.58, 2.21) 0.721 (0.54, 2.00) 0.917
trafficKA (0.58, 2.30) 0.688 (0.56, 2.20) 0.769
trafficrabrecNErab (0.94, 3.18) 0.077 (0.89, 2.95) 0.115
trafficrecNErab (0.45, 2.01) 0.893 (0.42, 1.85) 0.736
trafficreferal (0.76, 2.73) 0.257 (0.72, 2.55) 0.337
trafficyoujs (1.05, 3.53) 0.034 (1.00, 3.33) 0.049
coachno (0.85,1.316) 0.615 Excluded
coachyes (0.92, 1.66) 0.164 Excluded
head_genderm (0.86, 1.29) 0.590 Excluded
greywagewhite (0.46, 0.79) 0.0002 (0.46, 0.78) 0.0001
waycar (0.67, 1.00) 0.052 (0.66, 0.99) 0.039
wayfoot (0.48, 0.94) 0.020 (0.49, 0.96) 0.029
extraversion (0.95, 1.09) 0.639 Excluded
independ (0.91, 1.05) 0.587 Excluded
selfcontrol (0.89, 1.02) 0.202 (0.90, 0.98) 0.007
anxiety (0.89, 1.02) 0.158 (0.90, 1.00) 0.060
novator (0.95, 1.07) 0.766 Excluded

Predictions

We will now use our adjusted model to predict 3-year survival rates between two employees, one age 25 and one age 55. All other variables will remain the same.

Show Code
new_hire <- data.frame(age = c(25,55), industry = c("IT","IT"), stag= c(36, 36), event= c(0,0), profession = c("manage","manage"), traffic = c("empjs","empjs"), greywage = c("white","white"), way = c("car","car"), independ = c(5,5), selfcontrol = c(5,5), anxiety = c(5,5))

preds <- predict(model1, newdata = new_hire, type = "survival", se.fit = TRUE)

new_hire$prob <- preds$fit
new_hire$lcl <- preds$fit - 1.96*preds$se.fit
new_hire$ucl <- preds$fit + 1.96*preds$se.fit
new_hire

Our results show a survival probability of 38% for the 25-year old, and 17% percent for the 55-year old.

Conclusions

  • Cox regression can be used for any time-to-event data, not necessarily meaning death or disease.
  • Eight variables were said to have a higher influence on survival rate: age, industry, profession, traffic, greywage, way, selfcontrol, anxiety.
  • Study objective is to predict employee attrition to mitigate loss to the company.
  • Companies can provide benefits or stipends (such as bus passes, in this case) to try and retain employees. Specific industries can be focused on to improve retention.

Modeling of Survival after Chemotherapy for Colon Cancer

By Hermela Shimelis

Introduction

  • The colon cancer data is a built-in data set in the Survival R package [6]

  • Data set includes 929 subjects with stage B/C colon cancer who were randomized to three treatment groups then followed for 8-years.

    • Observation, Levamisole (Lev), Levamisole + 5-FU
  • The time to death or progression is given in days. The data set is filtered to evaluate time to death

  • Objective: Model the relationship between survival time and treatment

Predictors

Category Variables
Treatment Observation (no treatment)
Amisole (Lev)
Amisole + 5-FU
Patient Characteristics Age
Sex
Tumor Characteristics Colon perforation and obstruction
Adherence to nearby organs
Tumor differentiation
Local spread
Differentiation of tumor (well, moderate, poor)
More than 4 positive lymph nodes
Other Time from surgery to registration

Patient characteristics are similar between the three groups

Observation (%) Amisole (%) Amisole + 5-FU (%)
N=315 N=310 N=304
Demographics
Male 166 (52.3) 177 (57.1) 141
Median age (years) [IQR] 60 [53,68] 61 [53,69] 61 [52,70]
Cancer characteristics
Colon obstruction 63 (20.0) 63 (20.3) 54 (17.8)
Colon perforation 9 (2.9) 10 (3.2) 8 (2.6)
Adherence to nearby organs 47 (14.9) 49 (15.8) 39 (12.8)
Differentiation of tumor
Well 27 (8.6) 37 (11.9) 29 (9.5)
Moderate 236 (74.9) 229 (73.9) 221 (72.7)
Poor 52 (16.5) 44 (14.2) 54 (17.8)
Extent of local spread
Contiguous 20 (6.3) 12 (3.9) 11 (3.6)
Muscle 38 (12.1) 36 (11.6) 32 (10.5)
Serosa 249 (79.0) 259 (83.5) 251 (82.6)
Submucosa 8 (2.5) 3 (1.0) 10 (3.3)
More than 4 lymph nodes with cancer Yes 87 (27.6) 89 (28.7) 79 (26.0)
Short time from surgery to registration (%) Yes 91 (28.9) 80 (25.8) 76 (25.0)

Kaplan-Meier Curve Stratified by Treatment Groups

Show Code
fit <- survfit(Surv(time,status) ~ rx, data = colon_surv)
ggsurvplot(fit, data=colon_surv, risk.table = TRUE)

Cox Regression Models

Show Code
m0 <- coxph(Surv(time, status) ~ 1, data = df)
summary_m0 = summary(m0)
c_index_m0 <- concordance(m0)


cat("Concordance of the base model:",c_index_m0$concordance)
Concordance of the base model: 0.5
Show Code
Model_2 <- coxph(Surv(time, status) ~ rx, data = df)
Characteristic HR1 95% CI1 p-value
rx


    Obs
    Lev 0.97 0.78, 1.21 0.8
    Lev+5FU 0.69 0.55, 0.87 0.002
Concordance = 0.536
1 HR = Hazard Ratio, CI = Confidence Interval
Show Code
Model_3 <- coxph(Surv(time, status) ~ rx+ age + sex + perfor + adhere + surg + obstruct + differentiation + node4+ local_spread, data = df)
Characteristic HR1 95% CI1 p-value
rx


    Obs
    Lev 0.98 0.79, 1.22 0.9
    Lev+5FU 0.69 0.54, 0.87 0.002
age 1.01 1.00, 1.02 0.083
sex 1.04 0.86, 1.26 0.7
perfor 1.00 0.59, 1.70 >0.9
adhere 1.18 0.92, 1.53 0.2
surg 1.27 1.03, 1.55 0.022
obstruct 1.33 1.06, 1.68 0.015
differentiation


    moderate
    poor 1.43 1.13, 1.82 0.003
    well 1.08 0.78, 1.50 0.6
node4 2.55 2.10, 3.09 <0.001
local_spread


    contiguous
    muscle 0.39 0.23, 0.64 <0.001
    serosa 0.64 0.43, 0.94 0.023
    submucosa 0.29 0.10, 0.83 0.021
Concordance = 0.674
1 HR = Hazard Ratio, CI = Confidence Interval
Show Code
Model_4 <- coxph(Surv(time, status) ~ rx + age + surg + obstruct + 
    differentiation + node4 + local_spread, data = df)
Characteristic HR1 95% CI1 p-value
rx


    Obs
    Lev 0.99 0.80, 1.23 >0.9
    Lev+5FU 0.69 0.54, 0.87 0.002
age 1.01 1.00, 1.02 0.069
surg 1.28 1.04, 1.56 0.018
obstruct 1.33 1.06, 1.67 0.015
differentiation


    moderate
    poor 1.45 1.15, 1.84 0.002
    well 1.07 0.77, 1.48 0.7
node4 2.53 2.09, 3.07 <0.001
local_spread


    contiguous
    muscle 0.37 0.23, 0.61 <0.001
    serosa 0.61 0.41, 0.89 0.010
    submucosa 0.27 0.09, 0.76 0.014
Concordance = 0.672
1 HR = Hazard Ratio, CI = Confidence Interval
Schoenfeld Residuals Test Results
chisq df p Variable
rx 2.335 2 0.311 rx
age 0.549 1 0.459 age
surg 0.020 1 0.888 surg
obstruct 6.148 1 0.013 obstruct
differentiation 17.459 2 0.000 differentiation
node4 5.662 1 0.017 node4
local_spread 7.083 3 0.069 local_spread
GLOBAL 37.525 11 0.000 GLOBAL
Show Code
Model_5 <- coxph(Surv(time, status) ~ rx + age + surg + strata(obstruct) + strata(differentiation) + node4 +
              local_spread, data = df)
Characteristic HR1 95% CI1 p-value
rx


    Obs
    Lev 0.98 0.79, 1.22 0.9
    Lev+5FU 0.71 0.56, 0.89 0.003
age 1.01 1.00, 1.02 0.034
surg 1.30 1.06, 1.59 0.012
node4 2.50 2.06, 3.04 <0.001
local_spread


    contiguous
    muscle 0.34 0.21, 0.56 <0.001
    serosa 0.58 0.39, 0.84 0.004
    submucosa 0.24 0.08, 0.67 0.007
Concordance = 0.674
1 HR = Hazard Ratio, CI = Confidence Interval

Stratified Model Meets Proportional Hazards Assumption

Schoenfeld Residuals Test Results
chisq df p Variable
rx 2.001 2 0.368 rx
age 0.670 1 0.413 age
surg 0.014 1 0.905 surg
node4 4.288 1 0.038 node4
local_spread 5.298 3 0.151 local_spread
GLOBAL 12.411 8 0.134 GLOBAL

Model Evaluation Metrics

Model Description AIC BIC C_Index
Model 1 Base model 5860.383 5860.383 0.500
Model 2 Treatment 5852.236 5860.463 0.536
Model 3 Full variables 5741.401 5798.993 0.674
Model 4 Stepwise-selected variables 5737.261 5782.511 0.672
Model 5 Stratified 4567.829 4600.739 0.674

K-fold Cross Validation

Original c-index: 0.65 
Mean cross-validated c-Index: 0.64 

Conclusions

  • Treatment with Levamisole + 5-FU decreases the hazard of death from colon cancer by 29.5% (HR =0.71, 95% CI: 0.56-0.89; p=0.0035).
  • Having more than 4 tumor positive lymph nodes significantly increases the hazard of death by 150.3% (p <0.0001).
  • Having a long wait period from surgery to registration for trial is associated with an increase in the hazard by 29.5% (p=0.01).
  • Patients with local tumor spread to the submucosa, muscle and serosa have a reduction in the hazard by 76% (p=.0007), 66% (<0.001), and 43% (0.004), respectively, compared to those with contiguous organ spread.
  • The concordance of the model (0.67) indicates moderate predictive accuracy for survival time.
  • Other variables that were not included in the study may contribute to survival time.

Supplementary Data

Predictors

id: id
study: 1 for all patients
rx: Treatment - Obs(ervation), Lev(amisole), Lev(amisole)+5-FU
sex: 1=male
age: in years
obstruct: obstruction of colon by tumour
perfor: perforation of colon
adhere: adherence to nearby organs
nodes: number of lymph nodes with detectable cancer
time: days until event or censoring
status: censoring status
differ: differentiation of tumour (1=well, 2=moderate, 3=poor)
extent: Extent of local spread (1=submucosa, 2=muscle, 3=serosa, 4=contiguous structures)
surg: time from surgery to registration (0=short, 1=long)
node4: more than 4 positive lymph nodes
etype: event type: 1=recurrence,2=death

References

[1]
S. J. Walters, “Analyzing time to event outcomes with a cox regression model,” Wiley Interdiscip. Rev. Comput. Stat., vol. 4, no. 3, pp. 310–315, May 2012.
[2]
C. A. Bellera, G. MacGrogan, M. Debled, C. T. de Lara, V. Brouste, and S. Mathoulin-Pélissier, “Variables with time-varying effects and the cox model: Some statistical concepts illustrated with a prognostic factor study in breast cancer,” BMC Med. Res. Methodol., vol. 10, no. 1, p. 20, Mar. 2010.
[3]
S. Abd ElHafeez, G. D’Arrigo, D. Leonardis, M. Fusaro, G. Tripepi, and S. Roumeliotis, “Methods to analyze time-to-event data: The cox regression analysis,” Oxid. Med. Cell. Longev., vol. 2021, no. 1, p. 1302811, Nov. 2021.
[4]
Z. Zhang, J. Reinikainen, K. A. Adeleke, M. E. Pieterse, and C. G. M. Groothuis-Oudshoorn, “Time-varying covariates and coefficients in cox regression models,” Ann. Transl. Med., vol. 6, no. 7, pp. 121–121, Apr. 2018.
[5]
Terry M. Therneau and Patricia M. Grambsch, Modeling survival data: Extending the Cox model. New York: Springer, 2000.
[6]
T. M. Therneau, A package for survival analysis in r. 2024. Available: https://CRAN.R-project.org/package=survival